Who Searched My Blog? A Keyword Research Tutorial


Objectives


  • I have a blog about life in the US as a Korean (http://selim.tistory.com).
    • The posts cover diverse topics: everyday life tips, traveling in the US, cooking, grad-school life, self-care, and Urbana-Champaign (University of Illinois).
  • I gathered user data for about two and a half months (2021.9.1 - 2021.11.15) through Google Analytics.
  • I analyzed user information and search keywords to understand users' interests.

Methods


  • User information (i.e., country, region) and searched keywords (2021.9.1 - 2021.11.15) were extracted from Google Analytics.
  • For descriptive analysis,
    • Using Tableau, I plotted user information by pageviews and retention time.
  • For keyword research,
    • Using text mining methods, I identified general topics from the most frequently searched keywords (separately in Korean and English).

Summary of Findings


  • People from nearly all over the world viewed my blog.
    • The top 2 countries were South Korea and the U.S.A.
  • Because most posts were written in Korean, most keywords were searched in Korean.
  • The most-searched keywords were similar between users in South Korea and the U.S.A.
  • Top 3 topics of users' interests:
    • The 1st topic was local information (such as 'USA', 'Chicago', 'San Jose', 'LA', or 'Jamsil' (a district in Seoul)).
    • The 2nd topic was product reviews ('Recommendation', 'Review', 'Instant Pot', 'US grocery market', 'Trader Joe's', 'Shein').
    • The 3rd topic was diet/health ('Keto', 'LCHF', 'Review', 'diet', 'stevia').

Word Clouds


  • The code for making them is in the following sections of this page.
In [2]:
from IPython.display import Image 
Image("clouds.png")
Out[2]:

Descriptive Analysis for Understanding Users (enabled by Tableau)

  • Using Tableau, I plotted users' information.

Geographical Information of Users

  • Users from every continent viewed my blog.
  • The top 2 regions were Seoul and California.
  • The top 2 countries were South Korea and the U.S.A.

Time, Country and Page Views

  • The time users viewed my blog reflects the time zones they are in.

Source, Country, and Page Views

  • "Google Search" was the most common way users reached my blog, regardless of country.
  • Among Korean users, "Naver" was the second most common source, as "Naver" is the most popular search engine in South Korea.
  • Daum (another Korean search engine) and YouTube followed.

Searched Keywords by Pageviews and Average Time Retention

  • The most frequent word is "미국", which means "USA" in Korean.
  • However, the searched keywords were spread across many different topics.
  • Therefore, I used text mining methods on the keyword lists (saved as .csv files) to find the top topics.

Keywords Research (Text Mining)


  • The most frequently searched keywords were examined using text mining methods (konlpy for Korean keywords and nltk/wordcloud for English keywords).

  • Using wordcloud, the most frequent keywords were visualized separately in Korean and English.

1. Finding topics from Korean keywords

In [17]:
# Import required libraries
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [132]:
df = pd.read_csv('keywords.csv')
In [19]:
df.head()
Out[19]:
Keyword
0 1g 물건
1 1살 유나이티드 비즈니스 클래스
2 1세대 생리컵
3 3m 94
4 3m 94마스크
In [20]:
df.shape
Out[20]:
(2478, 1)
In [21]:
import re

def apply_regular_expression(text):
    # Remove every character that is not a space or Hangul (jamo ㄱ-ㅣ, syllables 가-힣)
    hangul = re.compile('[^ ㄱ-ㅣ 가-힣]')
    result = hangul.sub('', text)
    return result
In [22]:
apply_regular_expression(df['Keyword'][0])
Out[22]:
' 물건'
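As the output above shows, removing the non-Hangul characters leaves a leading space behind. The following is a minimal sketch, not part of the original notebook, of the same filtering idea with the leftover whitespace collapsed. The names `NON_HANGUL` and `keep_hangul` are hypothetical, chosen only for this illustration.

```python
import re

# Same idea as apply_regular_expression: drop every character that is not
# a space, a Hangul jamo (ㄱ-ㅣ), or a complete Hangul syllable (가-힣).
NON_HANGUL = re.compile(r'[^ ㄱ-ㅣ가-힣]')

def keep_hangul(text):
    """Remove non-Hangul characters, then collapse leftover whitespace."""
    stripped = NON_HANGUL.sub('', text)
    # split() with no arguments discards leading, trailing, and repeated spaces
    return ' '.join(stripped.split())

print(keep_hangul('1g 물건'))
print(keep_hangul('3m 94마스크'))
```

Note that a keyword written entirely in Latin letters or digits becomes an empty string under this filter, which is why English keywords need a separate pass.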
In [24]:
from konlpy.tag import Okt
from collections import Counter

okt = Okt()  
nouns = okt.nouns(apply_regular_expression(df['Keyword'][0]))
nouns
Out[24]:
['물건']
In [ ]:
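# Create corpus. Caution: "".join fuses the end of one keyword with the start of the next;
# " ".join(...) keeps keywords separate, though it may change the counts shown below.
corpus = "".join(df['Keyword'].tolist())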
In [ ]:
# Apply the regular expression and extract noun tokens from the corpus
nouns = okt.nouns(apply_regular_expression(corpus))
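How the keywords are joined into one corpus string affects where word boundaries fall. Here is a small sketch, using three made-up sample rows rather than the real `keywords.csv`, that compares joining with an empty string against joining with a space. Plain whitespace splitting stands in for the tokenizer only to make the effect visible; a morphological analyzer such as Okt may still separate some fused words, so treat this as an illustration rather than a measurement.

```python
# Hypothetical sample rows standing in for df['Keyword'].tolist()
keywords = ['미국 마트', '시카고 맛집', '인스턴트팟 추천']

fused = ''.join(keywords)    # no separator between rows
spaced = ' '.join(keywords)  # one space between rows

# Whitespace tokenization, used here only to expose the word boundaries
fused_tokens = fused.split()
spaced_tokens = spaced.split()

print(fused_tokens)
print(spaced_tokens)
```

With the empty separator, the last word of each row is glued to the first word of the next row, so fewer and longer tokens come out.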
In [28]:
counter = Counter(nouns)
In [29]:
counter.most_common(10)
Out[29]:
[('미국', 391),
 ('팟', 123),
 ('인스턴트', 113),
 ('시카고', 72),
 ('추천', 72),
 ('한국', 71),
 ('잠실', 61),
 ('산호세', 58),
 ('마트', 57),
 ('여자', 53)]
In [30]:
# Drop one-character tokens (a crude stopword filter)
available_counter = Counter({x: counter[x] for x in counter if len(x) > 1})
available_counter.most_common(10)
Out[30]:
[('미국', 391),
 ('인스턴트', 113),
 ('시카고', 72),
 ('추천', 72),
 ('한국', 71),
 ('잠실', 61),
 ('산호세', 58),
 ('마트', 57),
 ('여자', 53),
 ('어깨', 53)]

Most Common Keywords in Korean (Number of Times Searched)

  • 'USA' (391)
  • 'Instant' (113)
  • 'Chicago' (72)
  • 'Recommendation' (72)
  • 'Korea' (71)
  • 'Jamsil' (a district of Seoul) (61)
  • 'San Jose' (58)
  • 'Market' (57)
  • 'Women' (53)
  • 'Shoulder' (53)
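To move from single keywords to topics, the counts can be summed per topic. The sketch below uses the ten counts listed above. The `topic_of` mapping is a hand-made assumption for illustration, loosely following the topic examples in the Summary of Findings; it is not something the notebook computes.

```python
from collections import Counter

# Counts copied from the filtered top-10 list above
top_counts = {
    '미국': 391, '인스턴트': 113, '시카고': 72, '추천': 72, '한국': 71,
    '잠실': 61, '산호세': 58, '마트': 57, '여자': 53, '어깨': 53,
}

# Hypothetical keyword-to-topic mapping (an assumption, assigned by hand)
topic_of = {
    '미국': 'local information', '시카고': 'local information',
    '산호세': 'local information', '잠실': 'local information',
    '인스턴트': 'product reviews', '추천': 'product reviews',
    '마트': 'product reviews',
}

# Sum the keyword counts per topic; unmapped keywords fall into 'other'
topic_totals = Counter()
for word, n in top_counts.items():
    topic_totals[topic_of.get(word, 'other')] += n

print(topic_totals.most_common())
```

Changing the mapping changes the totals, so the result is only as good as the manual assignment of keywords to topics.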
In [31]:
# Load Korean stopwords from an external list (this variable is later overwritten by the WordCloud STOPWORDS set)
stopwords = pd.read_csv("https://raw.githubusercontent.com/yoonkt200/FastCampusDataset/master/korean_stopwords.txt").values.tolist()
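A loaded stopword list only has an effect once it is applied to the counts. Below is a minimal sketch of that step under stated assumptions: `rows` imitates the list-of-lists shape that `.values.tolist()` returns for a one-column DataFrame, and both the two stopwords and the counts for '에서' and '하는' are made up for illustration.

```python
from collections import Counter

# Hypothetical rows in the shape produced by DataFrame.values.tolist():
# one inner list per row, so they must be flattened before lookups.
rows = [['에서'], ['하는']]
stopword_set = {row[0] for row in rows}

# Sample counts; the values for '에서' and '하는' are invented for this sketch
counts = Counter({'미국': 391, '팟': 123, '인스턴트': 113, '추천': 72, '에서': 40, '하는': 5})

# Keep tokens that are longer than one character and not in the stopword set
filtered = Counter({
    word: n for word, n in counts.items()
    if len(word) > 1 and word not in stopword_set
})

print(filtered.most_common(3))
```

Using a set for the stopwords keeps each membership test cheap, which matters more as the list grows.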
In [42]:
!pip install wordcloud
Requirement already satisfied: wordcloud in /usr/local/lib/python3.7/dist-packages (1.5.0)
Requirement already satisfied: numpy>=1.6.1 in /usr/local/lib/python3.7/dist-packages (from wordcloud) (1.19.5)
Requirement already satisfied: pillow in /usr/local/lib/python3.7/dist-packages (from wordcloud) (7.1.2)
In [33]:
import nltk
nltk.download('stopwords')
nltk.download('punkt')
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Out[33]:
True
In [43]:
from wordcloud import WordCloud, STOPWORDS

import numpy as np
import nltk
from PIL import Image
In [34]:
text = open('keywords.csv').read()
circle = np.array(Image.open('circle_1.jpeg'))

stopwords = set(STOPWORDS)
stopwords.add("said")
In [35]:
print('The number of stopwords:',len(nltk.corpus.stopwords.words('english')))
print(nltk.corpus.stopwords.words('english')[:40])
The number of stopwords: 179
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this']
In [36]:
plt.figure(figsize=(8,8))
plt.imshow(circle, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis('off')
plt.show()
In [46]:
# Install Korean fonts
!apt-get update -qq
!apt-get install fonts-nanum* -qq
Selecting previously unselected package fonts-nanum.
(Reading database ... 155222 files and directories currently installed.)
Preparing to unpack .../fonts-nanum_20170925-1_all.deb ...
Unpacking fonts-nanum (20170925-1) ...
Selecting previously unselected package fonts-nanum-eco.
Preparing to unpack .../fonts-nanum-eco_1.000-6_all.deb ...
Unpacking fonts-nanum-eco (1.000-6) ...
Selecting previously unselected package fonts-nanum-extra.
Preparing to unpack .../fonts-nanum-extra_20170925-1_all.deb ...
Unpacking fonts-nanum-extra (20170925-1) ...
Selecting previously unselected package fonts-nanum-coding.
Preparing to unpack .../fonts-nanum-coding_2.5-1_all.deb ...
Unpacking fonts-nanum-coding (2.5-1) ...
Setting up fonts-nanum-extra (20170925-1) ...
Setting up fonts-nanum (20170925-1) ...
Setting up fonts-nanum-coding (2.5-1) ...
Setting up fonts-nanum-eco (1.000-6) ...
Processing triggers for fontconfig (2.12.6-0ubuntu2) ...
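The `wc.words_` dictionary printed by the next cell holds relative frequencies, not raw counts: each value is the word's count divided by the count of the most frequent word, so the top word is 1.0. The sketch below reproduces that scaling in plain Python. The function name `normalize_by_max` is hypothetical, and the raw counts (231, 80, 55, 1) are an inference from the ratios in the output, not values reported by the notebook.

```python
def normalize_by_max(counts):
    """Scale counts so that the most frequent word maps to 1.0."""
    top = max(counts.values())
    # Sort from most to least frequent, then divide each count by the maximum
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {word: n / top for word, n in ordered}

# Assumed raw counts, inferred from the ratios in the output (not from the source data)
counts = {'미국': 231, '인스턴트팟': 80, '시카고': 55, 'Keyword': 1}

relative = normalize_by_max(counts)
for word, value in relative.items():
    print(word, value)
```

This also explains why 'Keyword' shows up in the output at all: `open('keywords.csv').read()` includes the CSV header row in the text passed to the word cloud.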
In [47]:
text = open('keywords.csv').read()
circle = np.array(Image.open('circle_1.jpeg'))

stopwords = set(STOPWORDS)
stopwords.add("said")
fontpath = '/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf'
wc = WordCloud(font_path = fontpath, background_color='white', max_words=2000, mask=circle,
              stopwords = stopwords)

wc = wc.generate(text)
wc.words_
Out[47]:
{'미국': 1.0,
 '인스턴트팟': 0.3463203463203463,
 '시카고': 0.23809523809523808,
 '산호세': 0.22943722943722944,
 '추천': 0.20346320346320346,
 '가격': 0.12987012987012986,
 '저탄고지': 0.12987012987012986,
 '잠실새내': 0.12554112554112554,
 '후기': 0.12121212121212122,
 '스테비아': 0.12121212121212122,
 '맛집': 0.12121212121212122,
 '미국 마트': 0.11688311688311688,
 '트레이더조': 0.11255411255411256,
 '미국에서': 0.10822510822510822,
 '노브랜드': 0.1038961038961039,
 '트레이더스': 0.1038961038961039,
 '월마트': 0.09956709956709957,
 '샌프란시스코': 0.09523809523809523,
 '한국에서': 0.08658008658008658,
 '키토': 0.08225108225108226,
 '키토제닉': 0.08225108225108226,
 '영양제': 0.07792207792207792,
 '전도연': 0.07792207792207792,
 'trader joe': 0.07792207792207792,
 '어깨 넓은': 0.07792207792207792,
 '요리': 0.0735930735930736,
 '혼밥': 0.0735930735930736,
 '코스트코': 0.0735930735930736,
 '호텔': 0.06926406926406926,
 '다이어트': 0.06926406926406926,
 '메뉴': 0.06493506493506493,
 '에리스리톨': 0.06493506493506493,
 '포텐시에이터': 0.06493506493506493,
 '일리노이': 0.06493506493506493,
 '인스턴트': 0.06060606060606061,
 '여자': 0.05627705627705628,
 'shein': 0.05627705627705628,
 '추천템': 0.05627705627705628,
 '원피스': 0.05627705627705628,
 '떡대': 0.05627705627705628,
 '어바나 샴페인': 0.05627705627705628,
 '캘리포니아': 0.05194805194805195,
 '한식': 0.05194805194805195,
 '넓은 여자': 0.05194805194805195,
 '물건 한국에서': 0.05194805194805195,
 '비교': 0.047619047619047616,
 'tea': 0.047619047619047616,
 '커피': 0.047619047619047616,
 '라운지': 0.047619047619047616,
 '독일': 0.047619047619047616,
 '아마존': 0.047619047619047616,
 '잠실': 0.047619047619047616,
 '인팟': 0.047619047619047616,
 '생리컵': 0.04329004329004329,
 '체인점': 0.04329004329004329,
 'la': 0.04329004329004329,
 '여행': 0.04329004329004329,
 '고구마': 0.04329004329004329,
 '어바나샴페인': 0.04329004329004329,
 '차전자피': 0.04329004329004329,
 '종로5가': 0.04329004329004329,
 '유나이티드': 0.03896103896103896,
 '감기약': 0.03896103896103896,
 '스테이크': 0.03896103896103896,
 'splenda': 0.03896103896103896,
 '해외에서': 0.03896103896103896,
 '만들기': 0.03896103896103896,
 '몇년': 0.03896103896103896,
 '미국마트': 0.03896103896103896,
 '떡볶이': 0.03896103896103896,
 '보령약국': 0.03896103896103896,
 '새내역': 0.03896103896103896,
 '잠실새내역': 0.03896103896103896,
 '석박사 통합과정': 0.03896103896103896,
 '식빵': 0.03463203463203463,
 'la공항': 0.03463203463203463,
 '수집보류': 0.03463203463203463,
 '삼겹살': 0.03463203463203463,
 '파는곳': 0.03463203463203463,
 '대학': 0.03463203463203463,
 '저렴이': 0.03463203463203463,
 '로스엔젤레스': 0.03463203463203463,
 '롱혼': 0.03463203463203463,
 '맥도날드': 0.03463203463203463,
 '본죽': 0.03463203463203463,
 '생리': 0.03463203463203463,
 '혈당': 0.03463203463203463,
 '자가격리': 0.03463203463203463,
 '바비브라운 토스트': 0.03463203463203463,
 '타피오카 전분': 0.03463203463203463,
 '자가격리 문의처': 0.03463203463203463,
 '음료수': 0.030303030303030304,
 '르네상스': 0.030303030303030304,
 '문의처': 0.030303030303030304,
 '소고기': 0.030303030303030304,
 '김치전': 0.030303030303030304,
 '냉동': 0.030303030303030304,
 '약국': 0.030303030303030304,
 '헤모콤': 0.030303030303030304,
 '비싼': 0.030303030303030304,
 '대한항공': 0.030303030303030304,
 '과자': 0.030303030303030304,
 '불고기감': 0.030303030303030304,
 '비비고': 0.030303030303030304,
 '생크림': 0.030303030303030304,
 '실리콘밸리': 0.030303030303030304,
 '어바나': 0.030303030303030304,
 '타코벨': 0.030303030303030304,
 '홀푸드': 0.030303030303030304,
 '상비약': 0.030303030303030304,
 '좋은': 0.030303030303030304,
 '성분': 0.030303030303030304,
 '밥집': 0.030303030303030304,
 '샌프란': 0.030303030303030304,
 '오트파이버': 0.030303030303030304,
 '이효리': 0.030303030303030304,
 '종합운동장역': 0.030303030303030304,
 '별을 쏘다': 0.030303030303030304,
 '여자 코디': 0.030303030303030304,
 '석박사 통합': 0.030303030303030304,
 '코디 어깨': 0.030303030303030304,
 '드레스': 0.025974025974025976,
 '파스타': 0.025974025974025976,
 '스타벅스': 0.025974025974025976,
 'romwe': 0.025974025974025976,
 '강남역': 0.025974025974025976,
 '사기': 0.025974025974025976,
 '식품': 0.025974025974025976,
 '눈썹': 0.025974025974025976,
 '대학원': 0.025974025974025976,
 '리디북스': 0.025974025974025976,
 '코디': 0.025974025974025976,
 '라비두스': 0.025974025974025976,
 '비용': 0.025974025974025976,
 '식당': 0.025974025974025976,
 '샴페인': 0.025974025974025976,
 '석박사통합': 0.025974025974025976,
 '청심환': 0.025974025974025976,
 '파는': 0.025974025974025976,
 '피로회복제': 0.025974025974025976,
 '헤모콤액': 0.025974025974025976,
 '티스토리': 0.025974025974025976,
 '삼계탕': 0.025974025974025976,
 '레시피': 0.025974025974025976,
 '싸게': 0.025974025974025976,
 '설탕': 0.025974025974025976,
 '신천역': 0.025974025974025976,
 '아메리칸항공': 0.025974025974025976,
 '풍납동': 0.025974025974025976,
 'whole earth': 0.025974025974025976,
 '김치찜 인스턴트팟': 0.025974025974025976,
 '진저 바비브라운': 0.025974025974025976,
 '물건': 0.021645021645021644,
 '3m': 0.021645021645021644,
 '티백': 0.021645021645021644,
 '쇼핑': 0.021645021645021644,
 '소스': 0.021645021645021644,
 'congress': 0.021645021645021644,
 '한국': 0.021645021645021644,
 '직구': 0.021645021645021644,
 '요시노야': 0.021645021645021644,
 '달고나': 0.021645021645021644,
 '화장품': 0.021645021645021644,
 '다운타운': 0.021645021645021644,
 'zzzquil': 0.021645021645021644,
 '가성비': 0.021645021645021644,
 '상체': 0.021645021645021644,
 '감바스': 0.021645021645021644,
 '저혈압': 0.021645021645021644,
 '탈모': 0.021645021645021644,
 '상의': 0.021645021645021644,
 '넓은어깨': 0.021645021645021644,
 '왁싱': 0.021645021645021644,
 '뉴욕': 0.021645021645021644,
 '대호': 0.021645021645021644,
 '마라탕': 0.021645021645021644,
 '음식': 0.021645021645021644,
 '철분': 0.021645021645021644,
 '석박사': 0.021645021645021644,
 '맛있는': 0.021645021645021644,
 '그릇': 0.021645021645021644,
 '유학': 0.021645021645021644,
 '쇼핑몰': 0.021645021645021644,
 '석박사통합과정': 0.021645021645021644,
 '심리학': 0.021645021645021644,
 '입국': 0.021645021645021644,
 '주방용품': 0.021645021645021644,
 '바나나리퍼블릭': 0.021645021645021644,
 '니트': 0.021645021645021644,
 '산타크루즈': 0.021645021645021644,
 '서울': 0.021645021645021644,
 '홍콩반점': 0.021645021645021644,
 '하는': 0.021645021645021644,
 '수육': 0.021645021645021644,
 '아이허브': 0.021645021645021644,
 '아몬드가루': 0.021645021645021644,
 '인천공항': 0.021645021645021644,
 '전주': 0.021645021645021644,
 '인스탄트팟': 0.021645021645021644,
 '카레': 0.021645021645021644,
 '조성진': 0.021645021645021644,
 '종로약국': 0.021645021645021644,
 '칙펠레': 0.021645021645021644,
 '트레이더': 0.021645021645021644,
 'bagel sesame': 0.021645021645021644,
 '생리컵 flex': 0.021645021645021644,
 'jj house': 0.021645021645021644,
 'la 공항': 0.021645021645021644,
 '명랑 핫도그': 0.021645021645021644,
 '미국 인터넷': 0.021645021645021644,
 '시카고 근교': 0.021645021645021644,
 '커피랑 도서관': 0.021645021645021644,
 '일리노이 샴페인': 0.021645021645021644,
 '건물': 0.017316017316017316,
 'azo': 0.017316017316017316,
 'beef': 0.017316017316017316,
 '공항': 0.017316017316017316,
 '인앤아웃': 0.017316017316017316,
 '햄버거': 0.017316017316017316,
 '귀걸이': 0.017316017316017316,
 'zaful': 0.017316017316017316,
 '가을': 0.017316017316017316,
 '부작용': 0.017316017316017316,
 '사이즈': 0.017316017316017316,
 '경주': 0.017316017316017316,
 '된장찌개': 0.017316017316017316,
 '응급실': 0.017316017316017316,
 '설탕대체제': 0.017316017316017316,
 '버거': 0.017316017316017316,
 '저탄': 0.017316017316017316,
 '카스타드': 0.017316017316017316,
 '예쁜': 0.017316017316017316,
 '카페': 0.017316017316017316,
 '찜질방': 0.017316017316017316,
 '방법': 0.017316017316017316,
 '보건소': 0.017316017316017316,
 '인천': 0.017316017316017316,
 '공장': 0.017316017316017316,
 '가방': 0.017316017316017316,
 '이용': 0.017316017316017316,
 '부대찌개': 0.017316017316017316,
 '레토르트': 0.017316017316017316,
 '쉐도우': 0.017316017316017316,
 '마트': 0.017316017316017316,
 '바비인형': 0.017316017316017316,
 '명랑핫도그': 0.017316017316017316,
 '해변': 0.017316017316017316,
 '장학금': 0.017316017316017316,
 '원두': 0.017316017316017316,
 '보세': 0.017316017316017316,
 '한국식당': 0.017316017316017316,
 '위치': 0.017316017316017316,
 '조선옥': 0.017316017316017316,
 '킹스파': 0.017316017316017316,
 '오설록': 0.017316017316017316,
 '자취': 0.017316017316017316,
 '정신과': 0.017316017316017316,
 '주유소': 0.017316017316017316,
 '제품': 0.017316017316017316,
 '편집샵': 0.017316017316017316,
 '미국대학': 0.017316017316017316,
 '타피오카': 0.017316017316017316,
 '바나프레소': 0.017316017316017316,
 '패션': 0.017316017316017316,
 '비맥스액티브': 0.017316017316017316,
 '액상': 0.017316017316017316,
 '종류': 0.017316017316017316,
 '설사': 0.017316017316017316,
 '빈속에': 0.017316017316017316,
 '새내': 0.017316017316017316,
 '갈비': 0.017316017316017316,
 '수리산': 0.017316017316017316,
 '생리통': 0.017316017316017316,
 '아이스크림': 0.017316017316017316,
 '정리': 0.017316017316017316,
 '천호': 0.017316017316017316,
 '송파구': 0.017316017316017316,
 '해외입국': 0.017316017316017316,
 '트루비아': 0.017316017316017316,
 '치즈케익': 0.017316017316017316,
 '체형': 0.017316017316017316,
 '어깨넓은여자': 0.017316017316017316,
 '우황청심환': 0.017316017316017316,
 '유나이티드항공': 0.017316017316017316,
 '만두': 0.017316017316017316,
 '냉동만두': 0.017316017316017316,
 '카레여왕': 0.017316017316017316,
 '일식집': 0.017316017316017316,
 '종합운동장': 0.017316017316017316,
 '천호역': 0.017316017316017316,
 '티라미수': 0.017316017316017316,
 '파리바게트': 0.017316017316017316,
 'crate barrel': 0.017316017316017316,
 'everything bagel': 0.017316017316017316,
 'Eyebuydirect eyebuydirect': 0.017316017316017316,
 'flex cup': 0.017316017316017316,
 'Kicking Horse': 0.017316017316017316,
 'three sisters': 0.017316017316017316,
 '비맥스 액티브': 0.017316017316017316,
 '눈썹 왁싱': 0.017316017316017316,
 '엄마랑 서울': 0.017316017316017316,
 '잠실새내 스터디카페': 0.017316017316017316,
 '오헤어 공항': 0.017316017316017316,
 '아메리칸 항공': 0.017316017316017316,
 '어깨넓은 여자': 0.017316017316017316,
 '에반 윌리엄스': 0.017316017316017316,
 '문의처 코로나': 0.017316017316017316,
 '한국물건 해외에서': 0.017316017316017316,
 '한국에서 미국으로': 0.017316017316017316,
 '본사': 0.012987012987012988,
 'asos': 0.012987012987012988,
 'seasoning': 0.012987012987012988,
 'bai': 0.012987012987012988,
 'banza': 0.012987012987012988,
 'steak': 0.012987012987012988,
 'chuck': 0.012987012987012988,
 'rib': 0.012987012987012988,
 '레스토랑': 0.012987012987012988,
 '포틀랜드': 0.012987012987012988,
 'gap': 0.012987012987012988,
 '웨딩드레스': 0.012987012987012988,
 'pappy': 0.012987012987012988,
 'chip': 0.012987012987012988,
 'rishi': 0.012987012987012988,
 'keto': 0.012987012987012988,
 '칼로리': 0.012987012987012988,
 '몽크프룻': 0.012987012987012988,
 '해장국': 0.012987012987012988,
 '갤포스': 0.012987012987012988,
 '화단에': 0.012987012987012988,
 '바디스캔': 0.012987012987012988,
 '쿠키': 0.012987012987012988,
 '데이트': 0.012987012987012988,
 '격리': 0.012987012987012988,
 '경옥고': 0.012987012987012988,
 '미국입국': 0.012987012987012988,
 '직판': 0.012987012987012988,
 '어지럼증': 0.012987012987012988,
 '사야할': 0.012987012987012988,
 '끼리': 0.012987012987012988,
 '네이버': 0.012987012987012988,
 '웹마스터': 0.012987012987012988,
 '웹마스터도구': 0.012987012987012988,
 '등갈비': 0.012987012987012988,
 '납작당면': 0.012987012987012988,
 '마일리지': 0.012987012987012988,
 '덕소': 0.012987012987012988,
 '초콜릿': 0.012987012987012988,
 '사올물건': 0.012987012987012988,
 '고추장': 0.012987012987012988,
 '쌈장': 0.012987012987012988,
 '레고': 0.012987012987012988,
 '로라메르시에': 0.012987012987012988,
 '토스트': 0.012987012987012988,
 '롯데타워': 0.012987012987012988,
 '루이빌': 0.012987012987012988,
 '리터': 0.012987012987012988,
 '신라면': 0.012987012987012988,
 '부위': 0.012987012987012988,
 '장터국밥': 0.012987012987012988,
 '스터디': 0.012987012987012988,
 '무기력증': 0.012987012987012988,
 '고기': 0.012987012987012988,
 '기숙사': 0.012987012987012988,
 '학식': 0.012987012987012988,
 '대학교': 0.012987012987012988,
 '사이트': 0.012987012987012988,
 '프랜차이즈': 0.012987012987012988,
 '타이레놀': 0.012987012987012988,
 '반찬그릇': 0.012987012987012988,
 '전복죽': 0.012987012987012988,
 '산타클라라': 0.012987012987012988,
 '가는법': 0.012987012987012988,
 '시골': 0.012987012987012988,
 '콘그레스': 0.012987012987012988,
 '안경': 0.012987012987012988,
 '인터넷쇼핑몰': 0.012987012987012988,
 '세인트루이스': 0.012987012987012988,
 '사가야': 0.012987012987012988,
 '미국대학원': 0.012987012987012988,
 '염색약': 0.012987012987012988,
 '한식집': 0.012987012987012988,
 '쉬인': 0.012987012987012988,
 '위스키': 0.012987012987012988,
 '배아픔': 0.012987012987012988,
 '보드워크': 0.012987012987012988,
 '별을쏘다': 0.012987012987012988,
 '피임약': 0.012987012987012988,
 '보세옷': 0.012987012987012988,
 '블로그': 0.012987012987012988,
 '비에이블': 0.012987012987012988,
 '빈속': 0.012987012987012988,
 '라면': 0.012987012987012988,
 '산타나로우': 0.012987012987012988,
 '살찌는': 0.012987012987012988,
 '상체발달': 0.012987012987012988,
 '스터디카페': 0.012987012987012988,
 '식이섬유': 0.012987012987012988,
 '근교': 0.012987012987012988,
 '선물용': 0.012987012987012988,
 '성북동': 0.012987012987012988,
 '세종': 0.012987012987012988,
 '세종왁싱': 0.012987012987012988,
 '수입': 0.012987012987012988,
 'via': 0.012987012987012988,
 '스플렌다': 0.012987012987012988,
 '슬로우앤드': 0.012987012987012988,
 '탄수': 0.012987012987012988,
 '복용': 0.012987012987012988,
 '샤브샤브': 0.012987012987012988,
 '용산': 0.012987012987012988,
 '에어프라이어': 0.012987012987012988,
 '코로나': 0.012987012987012988,
 '알디': 0.012987012987012988,
 '어깨넓은': 0.012987012987012988,
 '생활': 0.012987012987012988,
 '엘에이': 0.012987012987012988,
 '여름쿨톤': 0.012987012987012988,
 '연어': 0.012987012987012988,
 '가공식품': 0.012987012987012988,
 '철분제': 0.012987012987012988,
 '사와야': 0.012987012987012988,
 '사갈': 0.012987012987012988,
 '고구마찌기': 0.012987012987012988,
 '인스턴트팟으로': 0.012987012987012988,
 '독서실': 0.012987012987012988,
 '커피랑도서관': 0.012987012987012988,
 '점심': 0.012987012987012988,
 '팬텍': 0.012987012987012988,
 '컵누들': 0.012987012987012988,
 '트레이더스조': 0.012987012987012988,
 '판도라': 0.012987012987012988,
 '풀무원': 0.012987012987012988,
 'seasoning blend': 0.012987012987012988,
 'bbq rub': 0.012987012987012988,
 'evan williams': 0.012987012987012988,
 'Horse Coffee': 0.012987012987012988,
 'Coffee three': 0.012987012987012988,
 'monk fruit': 0.012987012987012988,
 '예쁜 카페': 0.012987012987012988,
 '코디 떡대있는': 0.012987012987012988,
 '로라메르시에 진저': 0.012987012987012988,
 '브라질리언 왁싱': 0.012987012987012988,
 '실리콘 밸리': 0.012987012987012988,
 '인터넷 쇼핑몰': 0.012987012987012988,
 '본죽 전복죽': 0.012987012987012988,
 '유니언 스테이션': 0.012987012987012988,
 '씨네드쉐프 상품권': 0.012987012987012988,
 '데이트 엄마랑': 0.012987012987012988,
 '오트 파이버': 0.012987012987012988,
 '한국에서 미국에': 0.012987012987012988,
 '비즈니스': 0.008658008658008658,
 '94마스크': 0.008658008658008658,
 'kf94': 0.008658008658008658,
 '응급': 0.008658008658008658,
 '웨딩': 0.008658008658008658,
 'avec': 0.008658008658008658,
 '크랜베리': 0.008658008658008658,
 '음료': 0.008658008658008658,
 'bbq': 0.008658008658008658,
 '불고기': 0.008658008658008658,
 '오가닉': 0.008658008658008658,
 'caltrain': 0.008658008658008658,
 'Plaza': 0.008658008658008658,
 'cu': 0.008658008658008658,
 '구독': 0.008658008658008658,
 'DivaCup': 0.008658008658008658,
 '주차': 0.008658008658008658,
 'extra': 0.008658008658008658,
 'flexcup': 0.008658008658008658,
 'FLEX생리컵': 0.008658008658008658,
 'house': 0.008658008658008658,
 'jjshouse': 0.008658008658008658,
 '르네상스호텔': 0.008658008658008658,
 'la르네상스호텔': 0.008658008658008658,
 '통관': 0.008658008658008658,
 'ph': 0.008658008658008658,
 'KETTLE': 0.008658008658008658,
 'Pyure': 0.008658008658008658,
 '중국': 0.008658008658008658,
 '시즈닝': 0.008658008658008658,
 'Naturals': 0.008658008658008658,
 '제로': 0.008658008658008658,
 'antioxidant': 0.008658008658008658,
 'vrew': 0.008658008658008658,
 '가을웜톤': 0.008658008658008658,
 '샤브': 0.008658008658008658,
 '코코넛': 0.008658008658008658,
 '꽃나무': 0.008658008658008658,
 '마음챙김': 0.008658008658008658,
 '걸스카우트': 0.008658008658008658,
 '겔포스': 0.008658008658008658,
 '끼리한우': 0.008658008658008658,
 '과호흡': 0.008658008658008658,
 '귀리가루': 0.008658008658008658,
 '귀신': 0.008658008658008658,
 '기립성': 0.008658008658008658,
 '김치찜': 0.008658008658008658,
 '상품': 0.008658008658008658,
 '있는': 0.008658008658008658,
 '판매처': 0.008658008658008658,
 '먹는법': 0.008658008658008658,
 '막걸리': 0.008658008658008658,
 '사가는': 0.008658008658008658,
 '소불고기감': 0.008658008658008658,
 '어울리는': 0.008658008658008658,
 '네덜란드': 0.008658008658008658,
 '드럭스토어': 0.008658008658008658,
 '암스테르담': 0.008658008658008658,
 'la갈비': 0.008658008658008658,
 '간식': 0.008658008658008658,
 '샐러드': 0.008658008658008658,
 '자일리톨': 0.008658008658008658,
 '녹십자': 0.008658008658008658,
 '종합비타민': 0.008658008658008658,
 '놀숲': 0.008658008658008658,
 '굴방': 0.008658008658008658,
 '다듬기': 0.008658008658008658,
 '모양': 0.008658008658008658,
 '눈썹왁싱': 0.008658008658008658,
 '제거': 0.008658008658008658,
 '결혼식': 0.008658008658008658,
 '단호박': 0.008658008658008658,
 '치즈': 0.008658008658008658,
 '그라탕': 0.008658008658008658,
 '구경': 0.008658008658008658,
 '크림': 0.008658008658008658,
 '리터스포트': 0.008658008658008658,
 '유로파파크': 0.008658008658008658,
 '캔들워머': 0.008658008658008658,
 '드라마': 0.008658008658008658,
 '떡국': 0.008658008658008658,
 '사람': 0.008658008658008658,
 '야채죽': 0.008658008658008658,
 '로션': 0.008658008658008658,
 '음영': 0.008658008658008658,
 '루루스': 0.008658008658008658,
 '마트에': 0.008658008658008658,
 '파나요': 0.008658008658008658,
 '만둣국': 0.008658008658008658,
 '힘빠짐': 0.008658008658008658,
 '만화방': 0.008658008658008658,
 '말벌': 0.008658008658008658,
 '이유': 0.008658008658008658,
 '치폴레': 0.008658008658008658,
 '멕시칸': 0.008658008658008658,
 '핫도그': 0.008658008658008658,
 '선물': 0.008658008658008658,
 '모벌디': 0.008658008658008658,
 '목살': 0.008658008658008658,
 '무기력': 0.008658008658008658,
 '팔기': 0.008658008658008658,
 '브랜드': 0.008658008658008658,
 '곤약': 0.008658008658008658,
 '놀이공원': 0.008658008658008658,
 '석박': 0.008658008658008658,
 '통합': 0.008658008658008658,
 '주문': 0.008658008658008658,
 '리스트': 0.008658008658008658,
 '멀티방': 0.008658008658008658,
 '인기': 0.008658008658008658,
 '브라질리언': 0.008658008658008658,
 '비타민': 0.008658008658008658,
 '냉면': 0.008658008658008658,
 '대중교통': 0.008658008658008658,
 '투어': 0.008658008658008658,
 '과정': 0.008658008658008658,
 '셀프세차': 0.008658008658008658,
 '다이너': 0.008658008658008658,
 '본스치킨': 0.008658008658008658,
 '지역': 0.008658008658008658,
 '도수': 0.008658008658008658,
 '날씨': 0.008658008658008658,
 '온라인': 0.008658008658008658,
 '쇼핑리스트': 0.008658008658008658,
 '종합영양제': 0.008658008658008658,
 '입국시': 0.008658008658008658,
 '제모': 0.008658008658008658,
 '주방': 0.008658008658008658,
 '편의점': 0.008658008658008658,
 '치킨': 0.008658008658008658,
 '켄터키': 0.008658008658008658,
 '타겟': 0.008658008658008658,
 '티백차': 0.008658008658008658,
 '현지': 0.008658008658008658,
 '사와야할': 0.008658008658008658,
 '미국갈때': 0.008658008658008658,
 '미국마트에서': 0.008658008658008658,
 '싼물건': 0.008658008658008658,
 '한국물건': 0.008658008658008658,
 '사면': 0.008658008658008658,
 '구매': 0.008658008658008658,
 '상담': 0.008658008658008658,
 '미국월마트': 0.008658008658008658,
 '이미지': 0.008658008658008658,
 '미국코스트코': 0.008658008658008658,
 '상체비만': 0.008658008658008658,
 '바나나': 0.008658008658008658,
 '매장': 0.008658008658008658,
 '유명': 0.008658008658008658,
 '바비': 0.008658008658008658,
 '선크림': 0.008658008658008658,
 '쿨톤': 0.008658008658008658,
 '판매': 0.008658008658008658,
 '유래': 0.008658008658008658,
 '조인성': 0.008658008658008658,
 '청바지': 0.008658008658008658,
 '보령': 0.008658008658008658,
 '스페인': 0.008658008658008658,
 '종합감기약': 0.008658008658008658,
 '북촌': 0.008658008658008658,
 '왁싱모양': 0.008658008658008658,
 '까페': 0.008658008658008658,
 '끓이기': 0.008658008658008658,
 '일식': 0.008658008658008658,
 '블루보틀': 0.008658008658008658,
 '곰탕': 0.008658008658008658,
 '발달': 0.008658008658008658,
 '카라': 0.008658008658008658,
 '니트베스트': 0.008658008658008658,
 '혼밥집': 0.008658008658008658,
 '새싹귀리': 0.008658008658008658,
 '그레이시': 0.008658008658008658,
 '관광': 0.008658008658008658,
 '하프문베이': 0.008658008658008658,
 '수영': 0.008658008658008658,
 '탐폰': 0.008658008658008658,
 '액상철분': 0.008658008658008658,
 '소화': 0.008658008658008658,
 '도구': 0.008658008658008658,
 '샴페인어바나': 0.008658008658008658,
 '해외입국자': 0.008658008658008658,
 '모시고': 0.008658008658008658,
 '셀프하우스': 0.008658008658008658,
 '수플레': 0.008658008658008658,
 '치즈케이크': 0.008658008658008658,
 '스내플': 0.008658008658008658,
 '스키니진': 0.008658008658008658,
 '스타트업': 0.008658008658008658,
 '에리스톨': 0.008658008658008658,
 '스트라스부르': 0.008658008658008658,
 '오헤어공항': 0.008658008658008658,
 '클럽': 0.008658008658008658,
 '청와대': 0.008658008658008658,
 '시카고에서': 0.008658008658008658,
 '시카고오헤어공항': 0.008658008658008658,
 '신천': 0.008658008658008658,
 '청국장': 0.008658008658008658,
 '아기': 0.008658008658008658,
 '아메리칸': 0.008658008658008658,
 '항공': 0.008658008658008658,
 '아몬드': 0.008658008658008658,
 '활용': 0.008658008658008658,
 'tistory': 0.008658008658008658,
 '알리익스프레스': 0.008658008658008658,
 '애드빌': 0.008658008658008658,
 '요도염': 0.008658008658008658,
 '당뇨': 0.008658008658008658,
 '장식': 0.008658008658008658,
 '낱개': 0.008658008658008658,
 '여름': 0.008658008658008658,
 '홀터넥': 0.008658008658008658,
 '맨투맨': 0.008658008658008658,
 '어께': 0.008658008658008658,
 '에리스리톨스테비아': 0.008658008658008658,
 '싱글배럴': 0.008658008658008658,
 '에반윌리엄스': 0.008658008658008658,
 '에센소': 0.008658008658008658,
 '에어프라이': 0.008658008658008658,
 '민트': 0.008658008658008658,
 '초코': 0.008658008658008658,
 '컬러': 0.008658008658008658,
 '넓어보이는': 0.008658008658008658,
 '부침': 0.008658008658008658,
 '차종류': 0.008658008658008658,
 'pass': 0.008658008658008658,
 '쿠폰': 0.008658008658008658,
 '갈때': 0.008658008658008658,
 '사가면': 0.008658008658008658,
 '이데아': 0.008658008658008658,
 '이마트': 0.008658008658008658,
 '타피오카전분': 0.008658008658008658,
 '퍼스널컬러': 0.008658008658008658,
 '버터': 0.008658008658008658,
 '시절': 0.008658008658008658,
 '핑클': 0.008658008658008658,
 '카레만들기': 0.008658008658008658,
 '살까': 0.008658008658008658,
 '팟으로': 0.008658008658008658,
 '김치찌개': 0.008658008658008658,
 '떡만두국': 0.008658008658008658,
 '김치': 0.008658008658008658,
 '인앤아웃버거': 0.008658008658008658,
 '밥하기': 0.008658008658008658,
 '미역': 0.008658008658008658,
 '관련': 0.008658008658008658,
 '키오스크': 0.008658008658008658,
 '노트북': 0.008658008658008658,
 '저렴한': 0.008658008658008658,
 '홍콩반점0410': 0.008658008658008658,
 '잠실학원사거리': 0.008658008658008658,
 '유명템': 0.008658008658008658,
 '자켓': 0.008658008658008658,
 '한옥마을': 0.008658008658008658,
 '종로': 0.008658008658008658,
 '리뷰': 0.008658008658008658,
 '디스플레이': 0.008658008658008658,
 '차전자피가루': 0.008658008658008658,
 '티라미슈': 0.008658008658008658,
 '케이크': 0.008658008658008658,
 '리퍼블릭': 0.008658008658008658,
 '팔로알토': 0.008658008658008658,
 '해외': 0.008658008658008658,
 '퀴노아': 0.008658008658008658,
 '키웨스트': 0.008658008658008658,
 '키토식빵': 0.008658008658008658,
 '타피오카식빵': 0.008658008658008658,
 '탈리시': 0.008658008658008658,
 '조스': 0.008658008658008658,
 '사태': 0.008658008658008658,
 '티트리': 0.008658008658008658,
 '파이브가이즈': 0.008658008658008658,
 '판다익스프레스': 0.008658008658008658,
 '퍼스널': 0.008658008658008658,
 '플렉스': 0.008658008658008658,
 '플렉스컵': 0.008658008658008658,
 '관련문의처': 0.008658008658008658,
 '인기있는': 0.008658008658008658,
 '사야하는': 0.008658008658008658,
 '홀푸드마켓': 0.008658008658008658,
 'admirals club': 0.008658008658008658,
 'carne asada': 0.008658008658008658,
 'curtis orchard': 0.008658008658008658,
 'five guys': 0.008658008658008658,
 'nyquil 통관': 0.008658008658008658,
 'ph course': 0.008658008658008658,
 '건포도 명상': 0.008658008658008658,
 '곤약면 잡채': 0.008658008658008658,
 '어디서 빼야': 0.008658008658008658,
 '유로파 파크': 0.008658008658008658,
 '먹기 명상': 0.008658008658008658,
 '비즈니스 업그레이드': 0.008658008658008658,
 '의류 온라인': 0.008658008658008658,
 '반자 병아리콩': 0.008658008658008658,
 '산타나 로우': 0.008658008658008658,
 '선릉 이미지호': 0.008658008658008658,
 '수집 오류': 0.008658008658008658,
 '오류 페이지': 0.008658008658008658,
 '페이지 대표': 0.008658008658008658,
 '대표 목록': 0.008658008658008658,
 '윌리엄스 블랙': 0.008658008658008658,
 '콩그레스 플라자': 0.008658008658008658,
 '파이브 가이즈': 0.008658008658008658,
 'Keyword': 0.004329004329004329,
 '1g': 0.004329004329004329,
 '1살': 0.004329004329004329,
 '클래스': 0.004329004329004329,
 '1세대': 0.004329004329004329,
 '방진마스크': 0.004329004329004329,
 '마스크': 0.004329004329004329,
 '5시간자도피곤하지않은영양제': 0.004329004329004329,
 '17차': 0.004329004329004329,
 '650달러한국': 0.004329004329004329,
 'Oregon': 0.004329004329004329,
 'unit3': 0.004329004329004329,
 'urbana': 0.004329004329004329,
 'il': 0.004329004329004329,
 '1500원커피': 0.004329004329004329,
 '2021대한항공시카고왕복비행기값': 0.004329004329004329,
 'provided': 0.004329004329004329,
 'club': 0.004329004329004329,
 'adobe': 0.004329004329004329,
 'premiere': 0.004329004329004329,
 '한국말': 0.004329004329004329,
 'advil싸게구입': 0.004329004329004329,
 'airborne': 0.004329004329004329,
 'airborne발포비타민': 0.004329004329004329,
 'apple사': 0.004329004329004329,
 '목록': 0.004329004329004329,
 '유산균': 0.004329004329004329,
 '코코넛워터': 0.004329004329004329,
 'bai블루베리': 0.004329004329004329,
 'barilla': 0.004329004329004329,
 '병아리콩파스타': 0.004329004329004329,
 '기프티콘': 0.004329004329004329,
 '기프트콘': 0.004329004329004329,
 '창업': 0.004329004329004329,
 'rub': 0.004329004329004329,
 'BOTTOM': 0.004329004329004329,
 'ROUND': 0.004329004329004329,
 '영양성분': 0.004329004329004329,
 'Bigelow': 0.004329004329004329,
 '디카페인': 0.004329004329004329,
 '녹차': 0.004329004329004329,
 'champaign': 0.004329004329004329,
 'chick': 0.004329004329004329,
 'fil': 0.004329004329004329,
 'boneless': 0.004329004329004329,
 'cj': 0.004329004329004329,
 '삼계죽': 0.004329004329004329,
 'Hotel': 0.004329004329004329,
 'barrel': 0.004329004329004329,
 '끼리크림치즈': 0.004329004329004329,
 '홈런볼': 0.004329004329004329,
 'dansk': 0.004329004329004329,
 'menstrual': 0.004329004329004329,
 'cup': 0.004329004329004329,
 'europa': 0.004329004329004329,
 'park': 0.004329004329004329,
 'BOURBON': 0.004329004329004329,
 'single': 0.004329004329004329,
 'everthing': 0.004329004329004329,
 'everything': 0.004329004329004329,
 '유통기한': 0.004329004329004329,
 'ezekiel': 0.004329004329004329,
 'flourish': 0.004329004329004329,
 '시각화': 0.004329004329004329,
 'G7에스프레소': 0.004329004329004329,
 'g7커피맛': 0.004329004329004329,
 '허리색': 0.004329004329004329,
 '후드집업': 0.004329004329004329,
 'Gapfactory': 0.004329004329004329,
 'Crewneck': 0.004329004329004329,
 'Sweater': 0.004329004329004329,
 'gap직구': 0.004329004329004329,
 '여자옷': 0.004329004329004329,
 'hansa': 0.004329004329004329,
 'roasters': 0.004329004329004329,
 'H라인': 0.004329004329004329,
 '원피스가': 0.004329004329004329,
 '날씬해보이나요': 0.004329004329004329,
 'gutsy': 0.004329004329004329,
 'illinois': 0.004329004329004329,
 'cocomero': 0.004329004329004329,
 'IRSH': 0.004329004329004329,
 '삼각티백': 0.004329004329004329,
 'jj': 0.004329004329004329,
 "jj'shouse": 0.004329004329004329,
 'keywest': 0.004329004329004329,
 '방진': 0.004329004329004329,
 'Medium': 0.004329004329004329,
 'Roast': 0.004329004329004329,
 'out버거': 0.004329004329004329,
 '산타모니카': 0.004329004329004329,
 '주차장': 0.004329004329004329,
 '주차하기': 0.004329004329004329,
 '인앤아웃햄버거': 0.004329004329004329,
 '조식': 0.004329004329004329,
 '호텔추천': 0.004329004329004329,
 'la공항에서': 0.004329004329004329,
 'LA롱혼': 0.004329004329004329,
 'La산타모니카해변': 0.004329004329004329,
 'la왕복비행기값': 0.004329004329004329,
 'la요시노야': 0.004329004329004329,
 'likeher': 0.004329004329004329,
 'lollia': 0.004329004329004329,
 '핸드크림': 0.004329004329004329,
 'longhorn': 0.004329004329004329,
 'lulus': 0.004329004329004329,
 'lulus의상니트': 0.004329004329004329,
 'McKinley': 0.004329004329004329,
 'health': 0.004329004329004329,
 '예방접종': 0.004329004329004329,
 'mct': 0.004329004329004329,
 '오일': 0.004329004329004329,
 'Merry': 0.004329004329004329,
 'Ann': 0.004329004329004329,
 'Diner': 0.004329004329004329,
 'milpitas': 0.004329004329004329,
 'restaurant': 0.004329004329004329,
 'mindfulness': 0.004329004329004329,
 '앨런': 0.004329004329004329,
 'oscillococcinum': 0.004329004329004329,
 'smokehouse': 0.004329004329004329,
 'smoke': 0.004329004329004329,
 'peru': 0.004329004329004329,
 'select': 0.004329004329004329,
 'pest': 0.004329004329004329,
 'control': 0.004329004329004329,
 'pop': 0.004329004329004329,
 'kernels': 0.004329004329004329,
 'CORN': 0.004329004329004329,
 '얼그레이': 0.004329004329004329,
 'rishi아메리칸': 0.004329004329004329,
 '블랙퍼스트': 0.004329004329004329,
 '배송': 0.004329004329004329,
 'sauage': 0.004329004329004329,
 '같은': 0.004329004329004329,
 '야상': 0.004329004329004329,
 '익스프레스': 0.004329004329004329,
 '재고': 0.004329004329004329,
 '수납': 0.004329004329004329,
 'shein은': 0.004329004329004329,
 '어디거': 0.004329004329004329,
 '감미료': 0.004329004329004329,
 'stevia': 0.004329004329004329,
 '용도': 0.004329004329004329,
 'sunnyvale': 0.004329004329004329,
 'talisi': 0.004329004329004329,
 'teavana': 0.004329004329004329,
 'chai': 0.004329004329004329,
 'teavana케모마일': 0.004329004329004329,
 'terra': 0.004329004329004329,
 'Top': 0.004329004329004329,
 'sirloin': 0.004329004329004329,
 'facial': 0.004329004329004329,
 'serum': 0.004329004329004329,
 '건조과자': 0.004329004329004329,
 'truvia': 0.004329004329004329,
 'udacity': 0.004329004329004329,
 '할인': 0.004329004329004329,
 'vermon': 0.004329004329004329,
 'hills': 0.004329004329004329,
 '2시간': 0.004329004329004329,
 '딱딱소리': 0.004329004329004329,
 'WB21': 0.004329004329004329,
 '실속': 0.004329004329004329,
 '세트': 0.004329004329004329,
 'wei': 0.004329004329004329,
 'chuan': 0.004329004329004329,
 'whole': 0.004329004329004329,
 'Erythritol': 0.004329004329004329,
 '몽크': 0.004329004329004329,
 'earth몽크프룻': 0.004329004329004329,
 'shein같은': 0.004329004329004329,
 'shein추천': 0.004329004329004329,
 '반품': 0.004329004329004329,
 'zlxhwpslrtodzmfla': 0.004329004329004329,
 'zzzquil물약': 0.004329004329004329,
 '가디언': 0.004329004329004329,
 '오브': 0.004329004329004329,
 '포레스트': 0.004329004329004329,
 '한우직판장': 0.004329004329004329,
 '날씬해보이는': 0.004329004329004329,
 '가장': 0.004329004329004329,
 '테마파크': 0.004329004329004329,
 '간단': 0.004329004329004329,
 '아히요': 0.004329004329004329,
 '럭시피': 0.004329004329004329,
 '케토': 0.004329004329004329,
 '감정감각': 0.004329004329004329,
 '보슬보슬': 0.004329004329004329,
 '뼈다귀': 0.004329004329004329,
 '평점': 0.004329004329004329,
 '뼈해장국': 0.004329004329004329,
 '쌈주머니': 0.004329004329004329,
 '신세계': 0.004329004329004329,
 '강남역60대맛집음식점': 0.004329004329004329,
 '강남역dvd방': 0.004329004329004329,
 '강동구해외입국자가격리자지원': 0.004329004329004329,
 '보다': 0.004329004329004329,
 '55사이즈': 0.004329004329004329,
 '입술넥': 0.004329004329004329,
 '갭원피스': 0.004329004329004329,
 '조언': 0.004329004329004329,
 '과육': 0.004329004329004329,
 '심을경우': 0.004329004329004329,
 '연습': 0.004329004329004329,
 '건포도명상': 0.004329004329004329,
 '걷기명상': 0.004329004329004329,
 '건포도명상하는이유': 0.004329004329004329,
 '걷기우울증': 0.004329004329004329,
 '걸어다니는': 0.004329004329004329,
 '싸게사기': 0.004329004329004329,
 '싸게파는곳': 0.004329004329004329,
 '격리문의처': 0.004329004329004329,
 '경기도전화지역국': 0.004329004329004329,
 '한우': 0.004329004329004329,
 '한우직판': 0.004329004329004329,
 '계란김밥파는곳': 0.004329004329004329,
 '고구마랑': 0.004329004329004329,
 '어울리는쥬스': 0.004329004329004329,
 '고기없는된장찌개법': 0.004329004329004329,
 '고기없이': 0.004329004329004329,
 '고등학생': 0.004329004329004329,
 '곤약국수요리': 0.004329004329004329,
 '곤약면': 0.004329004329004329,
 '곤약면잡채': 0.004329004329004329,
 '공항라운지샌드위치': 0.004329004329004329,
 '과월경': 0.004329004329004329,
 '군마트': 0.004329004329004329,
 '귀리': 0.004329004329004329,
 '귀리차복용법': 0.004329004329004329,
 '들린': 0.004329004329004329,
 '그림명상수업': 0.004329004329004329,
 '글루': 0.004329004329004329,
 '5초': 0.004329004329004329,
 '기립성저혈압': 0.004329004329004329,
 '깔라만시와라임': 0.004329004329004329,
 '아이디어': 0.004329004329004329,
 '웨딩홀': 0.004329004329004329,
 '꽃나무야생화': 0.004329004329004329,
 '꽃사진': 0.004329004329004329,
 '제목': 0.004329004329004329,
 '끌레도르': 0.004329004329004329,
 '끝맛이': 0.004329004329004329,
 '깔끔한': 0.004329004329004329,
 '롤치즈케익': 0.004329004329004329,
 '모찌롤': 0.004329004329004329,
 '나우푸드': 0.004329004329004329,
 ...}
In [48]:
plt.figure(figsize=(12,12))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()

2. Finding English Keyword Topics

In [133]:
# Finding English keywords
# List that will hold the cleaned English keywords
list_eng = []

# Replace special characters with spaces
def cleanText(readData):
    text = re.sub('[-=+,#/\?:^$.@*\"※~&%ㆍ!』\\‘|\(\)\[\]\<\>`\'…》]', ' ', readData)
    return text

for i in df['Keyword']:
    # Replace everything except Latin letters (Korean, digits, punctuation) with spaces
    text = re.sub('[^a-zA-Z]',' ',i).strip()
    # Replace any remaining special characters
    text = cleanText(text)
    # Convert to uppercase
    text = text.upper()
    # Skip keywords with no English letters left
    if text != '':
        list_eng.append(text)

print(list_eng)
['G', 'M', 'M', 'M       KF', 'M', 'W OREGON UNIT  URBANA IL', 'NOT PROVIDED', 'ADMIRALS CLUB', 'ADOBE PREMIERE', 'ADVIL', 'AIRBORNE', 'AIRBORNE', 'APPLE', 'ASOS', 'ASOS', 'ASOS', 'AVEC', 'AZO', 'AZO', 'BAGEL SESAME', 'BAGEL SESAME SEASONING BLEND', 'BAI', 'BAI', 'BAI', 'BAI', 'BANZA', 'BANZA', 'BANZA', 'BARILLA', 'BBQ', 'BBQ RUB', 'BEEF BOTTOM ROUND STEAK', 'BEEF CHUCK RIB', 'BEEF CHUCK STEAK', 'BIGELOW', 'CALTRAIN', 'CALTRAIN', 'CHAMPAIGN', 'CHICK FIL A', 'CHUCK STEAK BONELESS CARNE ASADA', 'CJ', 'CONGRESS PLAZA HOTEL', 'CONGRESS', 'CRATE BARREL', 'CRATE BARREL', 'CRATE BARREL', 'CU', 'CU', 'CURTIS ORCHARD', 'CURTIS ORCHARD', 'DANSK', 'DIVACUP', 'DIVACUP MENSTRUAL CUP', 'EUROPA PARK', 'EVAN WILLIAMS', 'EVAN WILLIAMS        BOURBON', 'EVAN WILLIAMS SINGLE BARREL', 'EVERTHING BUT THE BAGEL SESAME', 'EVERYTHING BUT THE BAGEL SEASONING', 'EVERYTHING BUT THE BAGEL SESAME SEASONING BLEND', 'EVERYTHING THE BAGEL', 'EXTRA', 'EXTRA', 'EYEBUYDIRECT', 'EYEBUYDIRECT', 'EYEBUYDIRECT', 'EYEBUYDIRECT', 'EYEBUYDIRECT', 'EZEKIEL', 'FIVE GUYS LA', 'FLEX CUP', 'FLEX CUP', 'FLEX', 'FLEX', 'FLEXCUP', 'FLEXCUP', 'FLEX', 'FLEX', 'FLOURISH', 'G', 'G', 'GAP', 'GAP', 'GAP      S', 'GAPFACTORY CREWNECK SWEATER', 'GAP', 'HANSA COFFEE ROASTERS', 'H', 'I AM GUTSY', 'ILLINOIS COCOMERO', 'IRSH', 'JJ HOUSE', 'JJ HOUSE', 'JJ S HOUSE', 'JJ S HOUSE', 'JJ SHOUSE', 'JJ S HOUSE', 'JJS S HOUSE', 'JJSHOUSE', 'JJSHOUSE', 'KEYWEST', 'KF', 'KICKING HORSE COFFEE THREE SISTERS', 'KICKING HORSE COFFEE THREE SISTERS', 'KICKING HORSE COFFEE  THREE SISTERS  MEDIUM ROAST', 'KICKING HORSE THREE SISTERS', 'LA FIVE GUYS', 'LA    IN AND OUT', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LIKEHER', 'LOLLIA', 'LONGHORN', 'LULUS', 'LULUS', 'MCKINLEY HEALTH', 'MCT', 'MERRY ANN S DINER', 'MILPITAS RESTAURANT', 'MINDFULNESS', 'MONK FRUIT', 'NYQUIL', 'NYQUIL', 'OSCILLOCOCCINUM', 'PAPPY S HOUSE', 'PAPPY S SMOKEHOUSE', 'PAPPY S SMOKE', 'PERU 
SELECT COFFEE', 'PEST CONTROL', 'PH D COURSE', 'PH D  COURSE', 'POP KERNELS KETTLE CORN CHIP', 'PYURE', 'PYURE', 'RISHI', 'RISHI', 'ROMWE', 'ROMWE', 'ROMWE', 'ROMWE', 'ROMWE', 'ROMWE', 'RUB', 'SAUAGE', 'SEASONING BLEND', 'SHEIN ZAFUL', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SPLENDA KETO', 'SPLENDA MONK FRUIT', 'SPLENDA NATURALS', 'SPLENDA NATURALS', 'SPLENDA STEVIA', 'SPLENDA', 'SPLENDA', 'SPLENDA', 'SUNNYVALE', 'TALISI', 'TEAVANA CHAI TEA', 'TEAVANA', 'TERRA CHIP', 'TOP SIRLOIN', 'TRADER JOE S', 'TRADER JOE S SEASONING', 'TRADER JOE S TEA', 'TRADER JOE S', 'TRADER JOE S', 'TRADER JOE S ANTIOXIDANT', 'TRADER JOE S ANTIOXIDANT FACIAL SERUM', 'TRADER JOE S BAGEL SESAME', 'TRADER JOE S BBQ SEASONING', 'TRADER JOE S', 'TRADER JOE S', 'TRADER JOE S', 'TRADER JOE S', 'TRADER JOE S TEA', 'TRADER JOES BBQ RUB', 'TRADER JOES', 'TRADER S JOE', 'TRUVIA', 'UDACITY', 'VERMON HILLS', 'VREW', 'VREW', 'WB', 'WEI CHUAN', 'WHOLE EARTH ERYTHRITOL', 'WHOLE EARTH', 'WHOLE EARTH', 'WHOLE EARTH', 'WHOLE EARTH', 'WHOLE EARTH', 'WHOLE EARTH', 'ZAFUL SHEIN', 'ZAFUL SHEIN', 'ZAFUL', 'ZLXHWPSLRTODZMFLA', 'ZZZQUIL', 'ZZZQUIL', 'ZZZQUIL', 'ZZZQUIL', 'ZZZQUIL', 'ZZZQUIL', 'DVD', 'TRADER JOE', 'G', 'LA', 'LA', 'PRIORITY CLUB', 'IN N OUT BURGER', 'LA', 'LA', 'LA', 'KETTLE CHIP', 'LA', 'PH D', 'TEA', 'TEA', 'TEA', 'PH D', 'CARNE ASADA', 'KETO', 'PB', 'SHOULDER', 'RIB', 'TEA', 'ALDI', 'G', 'JANG TU', 'CRATE BARREL', 'DVD', 'PHILZ COFFEE', 'DAEHO', 'FLEX', 'FLEX CUP', 'FLEX CUP', 'FLEXCUP', 'CROSS RIB', 'G', 'RD', 'SPLENDA', 'VIA', 'ADMIRALS CLUB', 'AVEC', 'CONGRESS', 'CONGRESS PLAZA', 'CONGRESS', 'LA', 'LOCKER', 'AZO', 'TISTORY COM', 'TISTORY COM', 'G', 'RISHI', 'TAZO TEA', 'TEA', 'YOGI', 'PB', 'TEA', 'FAVICON', 'VUE', 'ONE TIME PASS', 'LA', 'ONETIME PASS', 'KG', 'LA', 'VIA', 'QT', 'M', 'VIA', 'KETO', 'MONK FRUIT', 'TEA', 'AZO', 'EVERYTHING BAGEL', 'BAGEL', 'BBQ RUB', 'EVERYTHING BUT THE ELOTE', 'LA', 'SHAVED BEEF', 'TEA 
TREE OIL', 'ALDI', 'RISHI']
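The printed list still contains runs of spaces (for example 'M       KF') where digits or Korean characters were replaced. Here is a minimal sketch of the same extraction step that also collapses those runs. `extract_english` is a hypothetical helper name and the sample keywords are made up for illustration; neither is part of the notebook above.

```python
import re

def extract_english(keyword):
    """Keep only Latin letters, collapse whitespace, and uppercase the result."""
    # Replace every character that is not an ASCII letter with a space.
    letters_only = re.sub(r'[^a-zA-Z]', ' ', keyword)
    # split() with no arguments drops leading, trailing, and repeated whitespace.
    return ' '.join(letters_only.split()).upper()

# Made-up sample keywords (assumption, not the blog data).
samples = ["trader joe's 추천", '미국 마트', 'kicking horse coffee, three sisters']
cleaned = [extract_english(s) for s in samples]
cleaned = [c for c in cleaned if c]  # drop keywords with no English letters
print(cleaned)
```

Collapsing whitespace does not change the token counts later on, since the tokenizer ignores repeated spaces anyway; it only makes the intermediate list easier to read.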
In [119]:
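# Wrap the list in a dictionary so it becomes a single 'name' column
eng_dict = {'name': list_eng}

# Use a separate DataFrame so the original keyword DataFrame `df` is not overwritten
df_eng = pd.DataFrame(eng_dict)

# Save the DataFrame (the row index and the 'name' header are written too)
df_eng.to_csv('eng_keywords.csv')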
In [121]:
corpus = pd.DataFrame(list_eng)
text = " ".join(corpus[0].tolist())
text
Out[121]:
'G M M M       KF M W OREGON UNIT  URBANA IL NOT PROVIDED ADMIRALS CLUB ADOBE PREMIERE ADVIL AIRBORNE AIRBORNE APPLE ASOS ASOS ASOS AVEC AZO AZO BAGEL SESAME BAGEL SESAME SEASONING BLEND BAI BAI BAI BAI BANZA BANZA BANZA BARILLA BBQ BBQ RUB BEEF BOTTOM ROUND STEAK BEEF CHUCK RIB BEEF CHUCK STEAK BIGELOW CALTRAIN CALTRAIN CHAMPAIGN CHICK FIL A CHUCK STEAK BONELESS CARNE ASADA CJ CONGRESS PLAZA HOTEL CONGRESS CRATE BARREL CRATE BARREL CRATE BARREL CU CU CURTIS ORCHARD CURTIS ORCHARD DANSK DIVACUP DIVACUP MENSTRUAL CUP EUROPA PARK EVAN WILLIAMS EVAN WILLIAMS        BOURBON EVAN WILLIAMS SINGLE BARREL EVERTHING BUT THE BAGEL SESAME EVERYTHING BUT THE BAGEL SEASONING EVERYTHING BUT THE BAGEL SESAME SEASONING BLEND EVERYTHING THE BAGEL EXTRA EXTRA EYEBUYDIRECT EYEBUYDIRECT EYEBUYDIRECT EYEBUYDIRECT EYEBUYDIRECT EZEKIEL FIVE GUYS LA FLEX CUP FLEX CUP FLEX FLEX FLEXCUP FLEXCUP FLEX FLEX FLOURISH G G GAP GAP GAP      S GAPFACTORY CREWNECK SWEATER GAP HANSA COFFEE ROASTERS H I AM GUTSY ILLINOIS COCOMERO IRSH JJ HOUSE JJ HOUSE JJ S HOUSE JJ S HOUSE JJ SHOUSE JJ S HOUSE JJS S HOUSE JJSHOUSE JJSHOUSE KEYWEST KF KICKING HORSE COFFEE THREE SISTERS KICKING HORSE COFFEE THREE SISTERS KICKING HORSE COFFEE  THREE SISTERS  MEDIUM ROAST KICKING HORSE THREE SISTERS LA FIVE GUYS LA    IN AND OUT LA LA LA LA LA LA LA LA LA LA LA LA LA LA LA LA LA LA LA LA LA LIKEHER LOLLIA LONGHORN LULUS LULUS MCKINLEY HEALTH MCT MERRY ANN S DINER MILPITAS RESTAURANT MINDFULNESS MONK FRUIT NYQUIL NYQUIL OSCILLOCOCCINUM PAPPY S HOUSE PAPPY S SMOKEHOUSE PAPPY S SMOKE PERU SELECT COFFEE PEST CONTROL PH D COURSE PH D  COURSE POP KERNELS KETTLE CORN CHIP PYURE PYURE RISHI RISHI ROMWE ROMWE ROMWE ROMWE ROMWE ROMWE RUB SAUAGE SEASONING BLEND SHEIN ZAFUL SHEIN SHEIN SHEIN SHEIN SHEIN SHEIN SHEIN SHEIN SHEIN SHEIN SHEIN SHEIN SHEIN SPLENDA KETO SPLENDA MONK FRUIT SPLENDA NATURALS SPLENDA NATURALS SPLENDA STEVIA SPLENDA SPLENDA SPLENDA SUNNYVALE TALISI TEAVANA CHAI TEA TEAVANA TERRA CHIP TOP SIRLOIN TRADER JOE S 
TRADER JOE S SEASONING TRADER JOE S TEA TRADER JOE S TRADER JOE S TRADER JOE S ANTIOXIDANT TRADER JOE S ANTIOXIDANT FACIAL SERUM TRADER JOE S BAGEL SESAME TRADER JOE S BBQ SEASONING TRADER JOE S TRADER JOE S TRADER JOE S TRADER JOE S TRADER JOE S TEA TRADER JOES BBQ RUB TRADER JOES TRADER S JOE TRUVIA UDACITY VERMON HILLS VREW VREW WB WEI CHUAN WHOLE EARTH ERYTHRITOL WHOLE EARTH WHOLE EARTH WHOLE EARTH WHOLE EARTH WHOLE EARTH WHOLE EARTH ZAFUL SHEIN ZAFUL SHEIN ZAFUL ZLXHWPSLRTODZMFLA ZZZQUIL ZZZQUIL ZZZQUIL ZZZQUIL ZZZQUIL ZZZQUIL DVD TRADER JOE G LA LA PRIORITY CLUB IN N OUT BURGER LA LA LA KETTLE CHIP LA PH D TEA TEA TEA PH D CARNE ASADA KETO PB SHOULDER RIB TEA ALDI G JANG TU CRATE BARREL DVD PHILZ COFFEE DAEHO FLEX FLEX CUP FLEX CUP FLEXCUP CROSS RIB G RD SPLENDA VIA ADMIRALS CLUB AVEC CONGRESS CONGRESS PLAZA CONGRESS LA LOCKER AZO TISTORY COM TISTORY COM G RISHI TAZO TEA TEA YOGI PB TEA FAVICON VUE ONE TIME PASS LA ONETIME PASS KG LA VIA QT M VIA KETO MONK FRUIT TEA AZO EVERYTHING BAGEL BAGEL BBQ RUB EVERYTHING BUT THE ELOTE LA SHAVED BEEF TEA TREE OIL ALDI RISHI'
In [77]:
from nltk.tokenize import word_tokenize
tokenized_word=word_tokenize(text)
print(tokenized_word)
['G', 'M', 'M', 'M', 'KF', 'M', 'W', 'OREGON', 'UNIT', 'URBANA', 'IL', 'NOT', 'PROVIDED', 'ADMIRALS', 'CLUB', 'ADOBE', 'PREMIERE', 'ADVIL', 'AIRBORNE', 'AIRBORNE', 'APPLE', 'ASOS', 'ASOS', 'ASOS', 'AVEC', 'AZO', 'AZO', 'BAGEL', 'SESAME', 'BAGEL', 'SESAME', 'SEASONING', 'BLEND', 'BAI', 'BAI', 'BAI', 'BAI', 'BANZA', 'BANZA', 'BANZA', 'BARILLA', 'BBQ', 'BBQ', 'RUB', 'BEEF', 'BOTTOM', 'ROUND', 'STEAK', 'BEEF', 'CHUCK', 'RIB', 'BEEF', 'CHUCK', 'STEAK', 'BIGELOW', 'CALTRAIN', 'CALTRAIN', 'CHAMPAIGN', 'CHICK', 'FIL', 'A', 'CHUCK', 'STEAK', 'BONELESS', 'CARNE', 'ASADA', 'CJ', 'CONGRESS', 'PLAZA', 'HOTEL', 'CONGRESS', 'CRATE', 'BARREL', 'CRATE', 'BARREL', 'CRATE', 'BARREL', 'CU', 'CU', 'CURTIS', 'ORCHARD', 'CURTIS', 'ORCHARD', 'DANSK', 'DIVACUP', 'DIVACUP', 'MENSTRUAL', 'CUP', 'EUROPA', 'PARK', 'EVAN', 'WILLIAMS', 'EVAN', 'WILLIAMS', 'BOURBON', 'EVAN', 'WILLIAMS', 'SINGLE', 'BARREL', 'EVERTHING', 'BUT', 'THE', 'BAGEL', 'SESAME', 'EVERYTHING', 'BUT', 'THE', 'BAGEL', 'SEASONING', 'EVERYTHING', 'BUT', 'THE', 'BAGEL', 'SESAME', 'SEASONING', 'BLEND', 'EVERYTHING', 'THE', 'BAGEL', 'EXTRA', 'EXTRA', 'EYEBUYDIRECT', 'EYEBUYDIRECT', 'EYEBUYDIRECT', 'EYEBUYDIRECT', 'EYEBUYDIRECT', 'EZEKIEL', 'FIVE', 'GUYS', 'LA', 'FLEX', 'CUP', 'FLEX', 'CUP', 'FLEX', 'FLEX', 'FLEXCUP', 'FLEXCUP', 'FLEX', 'FLEX', 'FLOURISH', 'G', 'G', 'GAP', 'GAP', 'GAP', 'S', 'GAPFACTORY', 'CREWNECK', 'SWEATER', 'GAP', 'HANSA', 'COFFEE', 'ROASTERS', 'H', 'I', 'AM', 'GUTSY', 'ILLINOIS', 'COCOMERO', 'IRSH', 'JJ', 'HOUSE', 'JJ', 'HOUSE', 'JJ', 'S', 'HOUSE', 'JJ', 'S', 'HOUSE', 'JJ', 'SHOUSE', 'JJ', 'S', 'HOUSE', 'JJS', 'S', 'HOUSE', 'JJSHOUSE', 'JJSHOUSE', 'KEYWEST', 'KF', 'KICKING', 'HORSE', 'COFFEE', 'THREE', 'SISTERS', 'KICKING', 'HORSE', 'COFFEE', 'THREE', 'SISTERS', 'KICKING', 'HORSE', 'COFFEE', 'THREE', 'SISTERS', 'MEDIUM', 'ROAST', 'KICKING', 'HORSE', 'THREE', 'SISTERS', 'LA', 'FIVE', 'GUYS', 'LA', 'IN', 'AND', 'OUT', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 'LA', 
'LA', 'LA', 'LA', 'LA', 'LA', 'LIKEHER', 'LOLLIA', 'LONGHORN', 'LULUS', 'LULUS', 'MCKINLEY', 'HEALTH', 'MCT', 'MERRY', 'ANN', 'S', 'DINER', 'MILPITAS', 'RESTAURANT', 'MINDFULNESS', 'MONK', 'FRUIT', 'NYQUIL', 'NYQUIL', 'OSCILLOCOCCINUM', 'PAPPY', 'S', 'HOUSE', 'PAPPY', 'S', 'SMOKEHOUSE', 'PAPPY', 'S', 'SMOKE', 'PERU', 'SELECT', 'COFFEE', 'PEST', 'CONTROL', 'PH', 'D', 'COURSE', 'PH', 'D', 'COURSE', 'POP', 'KERNELS', 'KETTLE', 'CORN', 'CHIP', 'PYURE', 'PYURE', 'RISHI', 'RISHI', 'ROMWE', 'ROMWE', 'ROMWE', 'ROMWE', 'ROMWE', 'ROMWE', 'RUB', 'SAUAGE', 'SEASONING', 'BLEND', 'SHEIN', 'ZAFUL', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SHEIN', 'SPLENDA', 'KETO', 'SPLENDA', 'MONK', 'FRUIT', 'SPLENDA', 'NATURALS', 'SPLENDA', 'NATURALS', 'SPLENDA', 'STEVIA', 'SPLENDA', 'SPLENDA', 'SPLENDA', 'SUNNYVALE', 'TALISI', 'TEAVANA', 'CHAI', 'TEA', 'TEAVANA', 'TERRA', 'CHIP', 'TOP', 'SIRLOIN', 'TRADER', 'JOE', 'S', 'TRADER', 'JOE', 'S', 'SEASONING', 'TRADER', 'JOE', 'S', 'TEA', 'TRADER', 'JOE', 'S', 'TRADER', 'JOE', 'S', 'TRADER', 'JOE', 'S', 'ANTIOXIDANT', 'TRADER', 'JOE', 'S', 'ANTIOXIDANT', 'FACIAL', 'SERUM', 'TRADER', 'JOE', 'S', 'BAGEL', 'SESAME', 'TRADER', 'JOE', 'S', 'BBQ', 'SEASONING', 'TRADER', 'JOE', 'S', 'TRADER', 'JOE', 'S', 'TRADER', 'JOE', 'S', 'TRADER', 'JOE', 'S', 'TRADER', 'JOE', 'S', 'TEA', 'TRADER', 'JOES', 'BBQ', 'RUB', 'TRADER', 'JOES', 'TRADER', 'S', 'JOE', 'TRUVIA', 'UDACITY', 'VERMON', 'HILLS', 'VREW', 'VREW', 'WB', 'WEI', 'CHUAN', 'WHOLE', 'EARTH', 'ERYTHRITOL', 'WHOLE', 'EARTH', 'WHOLE', 'EARTH', 'WHOLE', 'EARTH', 'WHOLE', 'EARTH', 'WHOLE', 'EARTH', 'WHOLE', 'EARTH', 'ZAFUL', 'SHEIN', 'ZAFUL', 'SHEIN', 'ZAFUL', 'ZLXHWPSLRTODZMFLA', 'ZZZQUIL', 'ZZZQUIL', 'ZZZQUIL', 'ZZZQUIL', 'ZZZQUIL', 'ZZZQUIL', 'DVD', 'TRADER', 'JOE', 'G', 'LA', 'LA', 'PRIORITY', 'CLUB', 'IN', 'N', 'OUT', 'BURGER', 'LA', 'LA', 'LA', 'KETTLE', 'CHIP', 'LA', 'PH', 'D', 'TEA', 'TEA', 'TEA', 'PH', 'D', 'CARNE', 'ASADA', 'KETO', 
'PB', 'SHOULDER', 'RIB', 'TEA', 'ALDI', 'G', 'JANG', 'TU', 'CRATE', 'BARREL', 'DVD', 'PHILZ', 'COFFEE', 'DAEHO', 'FLEX', 'FLEX', 'CUP', 'FLEX', 'CUP', 'FLEXCUP', 'CROSS', 'RIB', 'G', 'RD', 'SPLENDA', 'VIA', 'ADMIRALS', 'CLUB', 'AVEC', 'CONGRESS', 'CONGRESS', 'PLAZA', 'CONGRESS', 'LA', 'LOCKER', 'AZO', 'TISTORY', 'COM', 'TISTORY', 'COM', 'G', 'RISHI', 'TAZO', 'TEA', 'TEA', 'YOGI', 'PB', 'TEA', 'FAVICON', 'VUE', 'ONE', 'TIME', 'PASS', 'LA', 'ONETIME', 'PASS', 'KG', 'LA', 'VIA', 'QT', 'M', 'VIA', 'KETO', 'MONK', 'FRUIT', 'TEA', 'AZO', 'EVERYTHING', 'BAGEL', 'BAGEL', 'BBQ', 'RUB', 'EVERYTHING', 'BUT', 'THE', 'ELOTE', 'LA', 'SHAVED', 'BEEF', 'TEA', 'TREE', 'OIL', 'ALDI', 'RISHI']
In [78]:
from nltk.probability import FreqDist
fdist = FreqDist(tokenized_word)
print(fdist)
<FreqDist with 206 samples and 540 outcomes>
In [80]:
fdist.most_common(10)
Out[80]:
[('LA', 34),
 ('S', 24),
 ('TRADER', 18),
 ('SHEIN', 16),
 ('JOE', 16),
 ('TEA', 12),
 ('BAGEL', 9),
 ('FLEX', 9),
 ('SPLENDA', 9),
 ('G', 7)]
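In the `FreqDist` summary, "samples" is the number of distinct tokens and "outcomes" is the total number of tokens. The same two numbers can be sketched with the standard library's `Counter`. The token list here is a toy sample, not the blog data.

```python
from collections import Counter

# Toy token list (assumption, for illustration only).
tokens = ['LA', 'TEA', 'LA', 'SHEIN', 'TEA', 'LA']
counts = Counter(tokens)

n_samples = len(counts)            # number of distinct tokens
n_outcomes = sum(counts.values())  # total number of tokens
top2 = counts.most_common(2)       # the two most frequent tokens
print(n_samples, n_outcomes, top2)
```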
In [81]:
# Frequency Distribution Plot
import matplotlib.pyplot as plt
fdist.plot(30,cumulative=False)
plt.show()
In [ ]:
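# Removing stop words
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words('english'))

word_tokens = word_tokenize(text)

# Note: NLTK's stop words are lowercase while these tokens are uppercase,
# so compare token.lower() instead if words such as 'THE' should be dropped.
result = []
for token in word_tokens:
    if token not in stop_words:
        result.append(token)
# From here on, `text` is a list of tokens rather than a string
text = result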
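Because the tokens were uppercased earlier and NLTK's stop-word list is lowercase, the membership test needs a case-insensitive comparison to remove words such as 'THE' or 'BUT'. Here is a minimal sketch of such a filter. The small `stop_words` set is a hand-picked stand-in for the NLTK list, so the example runs without any downloads.

```python
# Hand-picked stand-in for NLTK's English stop-word list (an assumption for this sketch).
stop_words = {'the', 'but', 'in', 'and', 'a', 'i', 'am', 'not'}

# A few tokens in the same uppercase form as the tokenized keywords.
tokens = ['EVERYTHING', 'BUT', 'THE', 'BAGEL', 'IN', 'AND', 'OUT', 'LA']

# Lowercase each token only for the comparison; keep the original form in the result.
filtered = [t for t in tokens if t.lower() not in stop_words]
print(filtered)
```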
In [83]:
counter = Counter(text)
counter.most_common(10)
Out[83]:
[('LA', 34),
 ('S', 24),
 ('TRADER', 18),
 ('SHEIN', 16),
 ('JOE', 16),
 ('TEA', 12),
 ('BAGEL', 9),
 ('FLEX', 9),
 ('SPLENDA', 9),
 ('G', 7)]
In [84]:
# Drop single-character tokens such as 'S' and 'G'
available_counter = Counter({x: counter[x] for x in counter if len(x) > 1})
available_counter.most_common(10)
Out[84]:
[('LA', 34),
 ('TRADER', 18),
 ('SHEIN', 16),
 ('JOE', 16),
 ('TEA', 12),
 ('BAGEL', 9),
 ('FLEX', 9),
 ('SPLENDA', 9),
 ('HOUSE', 7),
 ('WHOLE', 7)]
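Single-word counts split brand names apart ('TRADER' and 'JOE', 'WHOLE' and 'EARTH'), so counting adjacent word pairs within each keyword can make the topics easier to read. Here is a rough sketch using only the standard library. The `keywords` list is a short illustrative sample, and this is not the method used for the word cloud in this notebook.

```python
from collections import Counter

# Short illustrative sample in the same form as the cleaned keyword list.
keywords = ['TRADER JOE S TEA', 'TRADER JOE S', 'WHOLE EARTH',
            'WHOLE EARTH ERYTHRITOL', 'LA']

pair_counts = Counter()
for kw in keywords:
    # Drop one-letter leftovers such as 'S' before pairing.
    words = [w for w in kw.split() if len(w) > 1]
    # Pair each word with the one that follows it in the same keyword.
    pair_counts.update(zip(words, words[1:]))

print(pair_counts.most_common(2))
```

Pairs are built inside one keyword at a time, so words from two different searches are never joined.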
In [125]:
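# Read the saved keywords back in as one string
with open('eng_keywords.csv') as f:
    text_eng = f.read()
# Mask image that shapes the word cloud
circle = np.array(Image.open('circle_1.jpeg'))

# Use a separate name so the nltk `stopwords` module imported earlier is not shadowed
wc_stopwords = set(STOPWORDS)
wc_e = WordCloud(collocations=False, background_color='white', max_words=2000, mask=circle, stopwords=wc_stopwords)
wc_e = wc_e.generate(text_eng)
wc_e.words_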
Out[125]:
{'ADMIRALS': 0.058823529411764705,
 'ADOBE': 0.029411764705882353,
 'ADVIL': 0.029411764705882353,
 'AIRBORNE': 0.058823529411764705,
 'ALDI': 0.058823529411764705,
 'ANN': 0.029411764705882353,
 'ANTIOXIDANT': 0.058823529411764705,
 'APPLE': 0.029411764705882353,
 'ASADA': 0.058823529411764705,
 'ASOS': 0.08823529411764706,
 'AVEC': 0.058823529411764705,
 'AZO': 0.11764705882352941,
 'BAGEL': 0.2647058823529412,
 'BAI': 0.11764705882352941,
 'BANZA': 0.08823529411764706,
 'BARILLA': 0.029411764705882353,
 'BARREL': 0.14705882352941177,
 'BBQ': 0.14705882352941177,
 'BEEF': 0.11764705882352941,
 'BIGELOW': 0.029411764705882353,
 'BLEND': 0.08823529411764706,
 'BONELESS': 0.029411764705882353,
 'BOTTOM': 0.029411764705882353,
 'BOURBON': 0.029411764705882353,
 'BURGER': 0.029411764705882353,
 'CALTRAIN': 0.058823529411764705,
 'CARNE': 0.058823529411764705,
 'CHAI': 0.029411764705882353,
 'CHAMPAIGN': 0.029411764705882353,
 'CHICK': 0.029411764705882353,
 'CHIP': 0.08823529411764706,
 'CHUAN': 0.029411764705882353,
 'CHUCK': 0.08823529411764706,
 'CJ': 0.029411764705882353,
 'CLUB': 0.08823529411764706,
 'COCOMERO': 0.029411764705882353,
 'COFFEE': 0.17647058823529413,
 'CONGRESS': 0.14705882352941177,
 'CONTROL': 0.029411764705882353,
 'CORN': 0.029411764705882353,
 'COURSE': 0.058823529411764705,
 'CRATE': 0.11764705882352941,
 'CREWNECK': 0.029411764705882353,
 'CROSS': 0.029411764705882353,
 'CU': 0.058823529411764705,
 'CUP': 0.14705882352941177,
 'CURTIS': 0.058823529411764705,
 'DAEHO': 0.029411764705882353,
 'DANSK': 0.029411764705882353,
 'DINER': 0.029411764705882353,
 'DIVACUP': 0.058823529411764705,
 'DVD': 0.058823529411764705,
 'EARTH': 0.20588235294117646,
 'ELOTE': 0.029411764705882353,
 'ERYTHRITOL': 0.029411764705882353,
 'EUROPA': 0.029411764705882353,
 'EVAN': 0.08823529411764706,
 'EVERTHING': 0.029411764705882353,
 'EVERYTHING': 0.14705882352941177,
 'EXTRA': 0.058823529411764705,
 'EYEBUYDIRECT': 0.14705882352941177,
 'EZEKIEL': 0.029411764705882353,
 'FACIAL': 0.029411764705882353,
 'FAVICON': 0.029411764705882353,
 'FIL': 0.029411764705882353,
 'FIVE': 0.058823529411764705,
 'FLEX': 0.2647058823529412,
 'FLEXCUP': 0.08823529411764706,
 'FLOURISH': 0.029411764705882353,
 'FRUIT': 0.08823529411764706,
 'GAP': 0.11764705882352941,
 'GAPFACTORY': 0.029411764705882353,
 'GUTSY': 0.029411764705882353,
 'GUYS': 0.058823529411764705,
 'HANSA': 0.029411764705882353,
 'HEALTH': 0.029411764705882353,
 'HILLS': 0.029411764705882353,
 'HORSE': 0.11764705882352941,
 'HOTEL': 0.029411764705882353,
 'HOUSE': 0.20588235294117646,
 'IL': 0.029411764705882353,
 'ILLINOIS': 0.029411764705882353,
 'IRSH': 0.029411764705882353,
 'JANG': 0.029411764705882353,
 'JJ': 0.20588235294117646,
 'JJSHOUSE': 0.058823529411764705,
 'JOE': 0.5294117647058824,
 'KERNELS': 0.029411764705882353,
 'KETO': 0.08823529411764706,
 'KETTLE': 0.058823529411764705,
 'KEYWEST': 0.029411764705882353,
 'KF': 0.058823529411764705,
 'KG': 0.029411764705882353,
 'KICKING': 0.11764705882352941,
 'LA': 1.0,
 'LIKEHER': 0.029411764705882353,
 'LOCKER': 0.029411764705882353,
 'LOLLIA': 0.029411764705882353,
 'LONGHORN': 0.029411764705882353,
 'LULUS': 0.058823529411764705,
 'MCKINLEY': 0.029411764705882353,
 'MCT': 0.029411764705882353,
 'MEDIUM': 0.029411764705882353,
 'MENSTRUAL': 0.029411764705882353,
 'MERRY': 0.029411764705882353,
 'MILPITAS': 0.029411764705882353,
 'MINDFULNESS': 0.029411764705882353,
 'MONK': 0.08823529411764706,
 'NATURALS': 0.058823529411764705,
 'NYQUIL': 0.058823529411764705,
 'OIL': 0.029411764705882353,
 'ONE': 0.029411764705882353,
 'ONETIME': 0.029411764705882353,
 'ORCHARD': 0.058823529411764705,
 'OREGON': 0.029411764705882353,
 'OSCILLOCOCCINUM': 0.029411764705882353,
 'PAPPY': 0.08823529411764706,
 'PARK': 0.029411764705882353,
 'PASS': 0.058823529411764705,
 'PB': 0.058823529411764705,
 'PERU': 0.029411764705882353,
 'PEST': 0.029411764705882353,
 'PH': 0.11764705882352941,
 'PHILZ': 0.029411764705882353,
 'PLAZA': 0.058823529411764705,
 'POP': 0.029411764705882353,
 'PREMIERE': 0.029411764705882353,
 'PRIORITY': 0.029411764705882353,
 'PROVIDED': 0.029411764705882353,
 'PYURE': 0.058823529411764705,
 'QT': 0.029411764705882353,
 'RD': 0.029411764705882353,
 'RESTAURANT': 0.029411764705882353,
 'RIB': 0.08823529411764706,
 'RISHI': 0.11764705882352941,
 'ROAST': 0.029411764705882353,
 'ROASTERS': 0.029411764705882353,
 'ROMWE': 0.17647058823529413,
 'ROUND': 0.029411764705882353,
 'RUB': 0.11764705882352941,
 'SAUAGE': 0.029411764705882353,
 'SEASONING': 0.17647058823529413,
 'SELECT': 0.029411764705882353,
 'SERUM': 0.029411764705882353,
 'SESAME': 0.14705882352941177,
 'SHAVED': 0.029411764705882353,
 'SHEIN': 0.47058823529411764,
 'SHOULDER': 0.029411764705882353,
 'SHOUSE': 0.029411764705882353,
 'SINGLE': 0.029411764705882353,
 'SIRLOIN': 0.029411764705882353,
 'SISTERS': 0.11764705882352941,
 'SMOKE': 0.029411764705882353,
 'SMOKEHOUSE': 0.029411764705882353,
 'SPLENDA': 0.2647058823529412,
 'STEAK': 0.08823529411764706,
 'STEVIA': 0.029411764705882353,
 'SUNNYVALE': 0.029411764705882353,
 'SWEATER': 0.029411764705882353,
 'TALISI': 0.029411764705882353,
 'TAZO': 0.029411764705882353,
 'TEA': 0.35294117647058826,
 'TEAVANA': 0.058823529411764705,
 'TERRA': 0.029411764705882353,
 'THREE': 0.11764705882352941,
 'TIME': 0.029411764705882353,
 'TISTORY': 0.058823529411764705,
 'TOP': 0.029411764705882353,
 'TRADER': 0.5294117647058824,
 'TREE': 0.029411764705882353,
 'TRUVIA': 0.029411764705882353,
 'TU': 0.029411764705882353,
 'UDACITY': 0.029411764705882353,
 'UNIT': 0.029411764705882353,
 'URBANA': 0.029411764705882353,
 'VERMON': 0.029411764705882353,
 'VIA': 0.08823529411764706,
 'VREW': 0.058823529411764705,
 'VUE': 0.029411764705882353,
 'WB': 0.029411764705882353,
 'WEI': 0.029411764705882353,
 'WHOLE': 0.20588235294117646,
 'WILLIAMS': 0.08823529411764706,
 'YOGI': 0.029411764705882353,
 'ZAFUL': 0.11764705882352941,
 'ZLXHWPSLRTODZMFLA': 0.029411764705882353,
 'ZZZQUIL': 0.17647058823529413,
 'name': 0.029411764705882353}
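The values in `wc_e.words_` look like counts divided by the largest count: 'LA' is 1.0, and 0.0294... equals 1/34, which matches the 34 occurrences of 'LA' counted earlier. Here is a small sketch of that scaling, assuming this is how the weights are normalized. The counts come from the token list and frequency table above.

```python
# Counts read off the tokenized keywords above.
counts = {'LA': 34, 'SHEIN': 16, 'TEA': 12, 'UDACITY': 1}

# Scale every count by the largest one so the top word gets weight 1.0.
max_count = max(counts.values())
weights = {word: n / max_count for word, n in counts.items()}
print(weights)
```

This also explains the stray 'name' entry in the output: it is the CSV header, which appears once in the file and so gets the same weight as any word that occurs once.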
In [126]:
plt.figure(figsize=(10,10))
plt.imshow(wc_e, interpolation='bilinear')
plt.axis('off')
plt.show()

As the word clouds show, the top 3 topics of users' interests were:

  • 1st topic was local information (such as 'USA', 'Chicago', 'San Jose', 'LA', or 'Jamsil' (a district in Seoul)).
  • 2nd topic was product reviews ('Recommendation', 'Review', 'Instant Pot', 'US grocery market', 'Trader Joe's', 'Shein').
  • 3rd topic was diet/health ('Keto', 'LCHF', 'Review', 'diet', 'stevia').
In [ ]: